Concept-driven visualization for terascale data analytics
Over the past couple of decades the number of scientific data sets has exploded. The science community has since been facing the common problem of being drowned in data, and yet starved of information. Identification and extraction of meaningful features from large data sets has become one of the central problems of scientific research, for both simulation and sensory data sets. The problems at hand are multifold and need to be addressed concurrently to provide scientists with the necessary tools, methods, and systems. Firstly, the underlying data structures and management need to be optimized for the kind of data most commonly used in scientific research, i.e. terascale time-varying, multi-dimensional, multi-variate, and potentially non-uniform grids. This implies avoidance of data duplication, utilization of a transparent query structure, and use of sophisticated underlying data structures and algorithms. Secondly, in the case of scientific data sets, simplistic queries are not a sufficient method to describe subsets or features. For time-varying data sets, many features can generally be described as local events, i.e. spatially and temporally limited regions with characteristic properties in value space. While scientists most often know quite well what they are looking for in a data set, at times they cannot formally or definitively describe their concept to computer science experts, especially when it is based on partially substantiated knowledge. Scientists need to be enabled to query and extract such features or events directly, without having to rewrite their hypothesis in an inadequately simple query language. Thirdly, tools to analyze the quality and sensitivity of these event queries themselves are required.
Understanding local data sensitivity is a necessity for enabling scientists to refine query parameters as needed to produce more meaningful findings. Query sensitivity analysis can also be utilized to establish trends for event-driven queries, i.e. how the query sensitivity differs between locations and over a series of data sets. In this dissertation, we present an approach to apply these interdependent measures to aid scientists in better understanding their data sets. An integrated system containing all of the above tools and system parts is presented.
Solvent Mediated Assembly of Nanoparticles Confined in Mesoporous Alumina
The controlled self-assembly of thiol stabilized gold nanocrystals in a
mediating solvent and confined within mesoporous alumina was probed in situ
with small angle x-ray scattering. The evolution of the self-assembly process
was controlled reversibly via regulated changes in the amount of solvent
condensed from an undersaturated vapor. Analysis indicated that the
nanoparticles self-assembled into cylindrical monolayers within the porous
template. Nanoparticle nearest-neighbor separation within the monolayer
increased and the ordering decreased with the controlled addition of solvent.
The process was reversible with the removal of solvent. Isotropic clusters of
nanoparticles were also observed to form temporarily during desorption of the
liquid solvent and disappeared upon complete removal of liquid. Measurements of
the absorption and desorption of the solvent showed strong hysteresis upon
thermal cycling. In addition, the capillary filling transition for the solvent
in the nanoparticle-doped pores was shifted to larger chemical potential,
relative to the liquid/vapor coexistence, by a factor of 4 as compared to the
expected value for the same system without nanoparticles.
Comment: 9 pages, 9 figures, appeared in Phys. Rev.
Scalable Data Servers for Large Multivariate Volume Visualization
Volumetric datasets with multiple variables on each voxel over multiple time steps are often complex, especially when considering the exponentially large attribute space formed by the variables in combination with the spatial and temporal dimensions. It is intuitive, practical, and thus often desirable to interactively select a subset of the data from within that high-dimensional value space for efficient visualization. This approach is straightforward to implement if the dataset is small enough to be stored entirely in-core. However, for datasets of hundreds of gigabytes and beyond, this simplistic approach becomes infeasible, and more sophisticated solutions are needed. In this work, we developed a system that supports efficient visualization of an arbitrary subset, selected by range queries, of a large multivariate time-varying dataset. By employing specialized data structures and data distribution schemes, our system can leverage a large number of networked computers as parallel data servers and guarantees a near-optimal load balance. We demonstrate our system of scalable data servers using two large time-varying simulation datasets.
Accepted for the Council:
I am submitting herewith a dissertation written by Markus Glatter entitle
Terascale data organization for discovering multivariate climatic trends
Current visualization tools lack the ability to perform full-range spatial and temporal analysis on terascale scientific datasets. Two key reasons exist for this shortcoming: I/O and postprocessing on these datasets are being performed in suboptimal manners, and the subsequent data extraction and analysis routines have not been studied in depth at large scales. We resolved these issues through advanced I/O techniques and improvements to current query-driven visualization methods. We show the efficiency of our approach by analyzing over a terabyte of multivariate satellite data and addressing two key issues in climate science: time-lag analysis and drought assessment. Our methods allowed us to reduce the end-to-end execution times on these problems to one minute on a Cray XT4 machine.